Reducing computation in an i-vector speaker recognition system using a tree-structured universal background model
Abstract
The majority of state-of-the-art speaker recognition (SR) systems utilize speaker models that are derived from an adapted universal background model (UBM) in the form of a Gaussian mixture model (GMM). This is true for GMM supervector systems, joint factor analysis systems, and most recently i-vector systems. In all of these systems, the posterior probability and sufficient statistics calculations represent a computational bottleneck in both enrollment and testing. We propose a multi-layered hash system, employing a tree-structured GMM–UBM which uses Runnalls’ Gaussian mixture reduction technique, in order to reduce the number of these calculations. With this tree-structured hash, we can trade off reduction in computation against a corresponding degradation of equal error rate (EER). As an example, we reduce this computation by a factor of 15 while incurring less than 10% relative degradation of EER (or 0.3% absolute EER) when evaluated with NIST 2010 speaker recognition evaluation (SRE) telephone data. © 2014 Elsevier B.V. All rights reserved.
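The idea of a tree-structured hash over the UBM can be illustrated with a minimal two-layer sketch. It is not the authors' implementation: it assumes diagonal covariances, takes the grouping of UBM components into parent nodes as given (the hypothetical `groups` argument) rather than deriving it with Runnalls' pairwise merging cost, and builds each parent by a moment-preserving merge of its children. All function names are illustrative. For each frame, only the parent Gaussians and the children of the best-scoring parents are evaluated, which is where the saving in posterior computation would come from.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of one frame x under each diagonal-covariance Gaussian."""
    d = x.shape[0]
    diff = x - means
    return -0.5 * (d * np.log(2.0 * np.pi)
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff * diff / variances, axis=1))

def merge_moments(w, means, variances):
    """Moment-preserving merge of several diagonal Gaussians into one.
    Assumption: only the diagonal of the merged covariance is kept."""
    w_sum = w.sum()
    mu = (w[:, None] * means).sum(axis=0) / w_sum
    var = (w[:, None] * (variances + (means - mu) ** 2)).sum(axis=0) / w_sum
    return w_sum, mu, var

def build_two_layer_tree(weights, means, variances, groups):
    """Build one parent Gaussian per group of child (UBM) components.
    `groups` is a list of index arrays and is assumed to be given."""
    parents = [merge_moments(weights[g], means[g], variances[g]) for g in groups]
    pw = np.array([p[0] for p in parents])
    pm = np.stack([p[1] for p in parents])
    pv = np.stack([p[2] for p in parents])
    return pw, pm, pv

def tree_posteriors(x, weights, means, variances, groups, parent, top_n=1):
    """Approximate UBM posteriors for frame x, evaluating only the
    children of the top_n best-scoring parent nodes."""
    pw, pm, pv = parent
    parent_score = np.log(pw) + log_gauss_diag(x, pm, pv)
    best = np.argsort(parent_score)[-top_n:]
    idx = np.concatenate([groups[b] for b in best])
    ll = np.log(weights[idx]) + log_gauss_diag(x, means[idx], variances[idx])
    ll -= ll.max()                      # stabilise before exponentiating
    post = np.exp(ll)
    post /= post.sum()                  # normalise over the shortlist only
    full = np.zeros(weights.shape[0])   # components outside the shortlist get 0
    full[idx] = post
    return full
```

With `top_n` equal to the number of parent nodes the sketch falls back to evaluating every component; smaller values trade accuracy of the posteriors for fewer Gaussian evaluations per frame.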
Similar resources
Towards a more efficient SVM supervector speaker verification system using Gaussian reduction and a tree-structured hash
Speaker verification (SV) systems that employ maximum a posteriori (MAP) adaptation of a Gaussian mixture model (GMM) universal background model (UBM) incur a significant test-stage computational load in the calculation of a posteriori probabilities and sufficient statistics. We propose a multi-layered hash system employing a tree-structured GMM which uses Runnalls’ GMM reduction technique. The ...
A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification
We propose a tree-based kernel selection (TBKS) algorithm as a computationally efficient approach to the Gaussian mixture model–universal background model (GMM–UBM) based speaker identification. All Gaussian components in the universal background model are first clustered hierarchically into a tree and the corresponding acoustic space is mapped into structurally partitioned regions. When identi...
Comparison of spectral methods for spoken language identification
Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
Using Vector Quantization for Universal Background Model in Automatic Speaker Verification
We aim to describe different approaches for vector quantization in Automatic Speaker Verification. We designed our novel architecture based on multiple codebooks representing the speakers and the impostor model, called the universal background model, and compared it to another vector quantization approach used for reducing training data. We compared our scheme with the baseline system, Gaussian Mixtu...
Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework
The anchor modeling technique has been shown to be useful in reducing computational complexity for speaker identification and indexing of large audio databases. In this technique, speakers are projected onto a talker space spanned by a set of predefined anchor models which are usually represented by Gaussian Mixture Models (GMMs). The characterization of each speaker involves calculation of likeliho...
Journal title:
- Speech Communication
Volume 66
Publication date: 2015